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DENSITIES, SPECTRAL DENSITIES AND MODALITY 1 

By P. Laurie Davies and Arne Kovac 

Universitat Duisburg-Essen 

This paper considers the problem of specifying a simple approx- 
imating density function for a given data set (xi, . . . ,x n ). Simplicity 
is measured by the number of modes but several different definitions 
of approximation are introduced. The taut string method is used to 
control the numbers of modes and to produce candidate approxi- 
mating densities. Refinements are introduced that improve the local 
adaptivity of the procedures and the method is extended to spectral 
densities. 

1. Contents. In Section 1.1 we formulate the density problem in terms 
of obtaining the simplest density which is an adequate approximation for 
the given data. The taut string method of Davies and Kovac (2001) is 
adapted to the density problem and is used for producing candidate den- 
sities of increasing complexity. The difficulties of the density problem are 
discussed in Section 2. Section 3 contains a more detailed account of the ap- 
plication of the taut string method to the density problem. The asymptotics 
of the procedure on appropriate test beds are discussed in Section 4. A refine- 
ment based on cell occupancy frequencies which increases local sensitivity 
is described in Section 5. Section 5.4 compares the taut string method with 
kernel estimators in a small simulation study. Finally, Section 6 describes 
the application of the taut string methodology to the problem of spectral 
densities. 

1.1. The density problem. Given a sample x n = (x±, . . . ,x n ) of size n, 
we consider the problem of specifying a distribution F with the smallest 
number of modes such that the resulting model of i.i.d. random variables 
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X„ = (Xf , . . . , X£ ) with common distribution F is an adequate approxi- 
mation for the data x n . 

We use different concepts of approximation, one of which is the following. 
Let E n , 

1 n 

E n (x) = - V{xj < x}, 
n f— * 

denote the empirical distribution of the data x n and F n the empirical distri- 
bution function of n i.i.d. random variables with common distribution 
F. The Kolmogorov metric c?ko is defined by 

d KO (F,G)=sup{x:\F(x)-G(x)\}. 

The i.i.d. model with distribution F will be regarded as an adequate ap- 
proximation to the data x n if 

(1.1) d K o(E n ,F) < qu(n,a,d K o), 

where qu(n, a, d^o) denotes the a-quantile of the random variable dKo(Fn, F) 
which is independent of F for continuous F. This gives rise to the Kol- 
mogorov problem. 

Problem 1.1 (Kolmogorov problem). Determine the smallest integer k n 
for which there exists a density f n with k n modes and whose distribution 
F n satisfies 

(1.2) d KO (E n ,F n )<qu{n,a,d KO ). 

We note that the problem is well posed: for any data set x n it has a 
solution. We have posed the problem in terms of approximation so that no 
assumptions regarding the "true" data generating mechanism are required 
or made. 

The Problem 1 . 1 is formulated in terms of the smallest number of modes 
required for an adequate approximation. A detailed theoretical discussion of 
such one-sided problems is given by Donoho (1988); one of his examples is 
that of modality of nonparametric densities and spectral densities. His paper 
also raises interesting questions about statistical inference involving objects 
whose very existence cannot be shown, an example being the "underlying 
density" for the data. We avoid such problems by phrasing the paper in 
terms of approximation. 

Hartigan and Hartigan (1985) and Hartigan (2000) construct tests for 
the modality of a density function. They are based on the Kolmogorov dis- 
tance of the nearest mixture of uniform distributions to the data and are 
discussed in more detail below. 
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Hengartner and Stark (1995) also make use of the Kolmogorov ball to 
determine nonparametric confidence bounds for densities subject to an up- 
per bound for the number of modes. In the particular case of monotone 
or unimodal densities the width of their bounds on appropriate test beds 
is (logn/ji) 1 / 3 , which agrees with the results given in this paper. It seems 
that their bounds become difficult to calculate for more than one mode as 
the complexity is given as (") where I is the number of local extremes. The 
main differences with respect to the work of Hengartner and Stark (1995) 
are as follows: 

(i) we provide an explicit density but no bounds, 

(ii) neither the number of modes nor even an upper bound is specified in 
advance, 

(iii) the algorithmic complexity of our method is 0(n) independently of 
the number of modes. 

1.2. The taut string methodology. The basic methodology we use for 
producing densities is the taut string methodology. Taut strings were first 
used in the context of monotonic regression: the greatest convex mino- 
rant of the integrated data is a taut string and its derivative is precisely 
the monotone increasing least squares approximation. This is described in 
Barlow, Bartholomew, Bremner and Brunk (1972), who were the first to 
use the phrase "taut string." We refer also to Leurgans (1982). The first 
use of the taut string which goes beyond the monotone case and which ex- 
plicitly deals with modality is in Hartigan and Hartigan (1985), where it 
is referred to as the "stretched string." Hartigan and Hartigan (1985) in- 
troduced their DIP test for unimodality which is based on the closest (in 
the Kolmogorov metric) unimodal distribution to the empirical distribution 
function of the data. Based on the work of Hartigan and Hartigan (1985), 
Davies (1995) used the taut string method to produce candidate densities 
of low modality to approximate data. Mammen and van de Geer (1997) 
employed the taut string in the nonparametric regression problem. They 
considered a penalized least squares problem where the penalty is the to- 
tal variation of the approximating function. The solution is the basic taut 
string confined to a tube centered at the integrated data. Mammen and 
van de Geer (1997) gave a detailed description of the taut string but did 
not mention the connection with modality. Hartigan (2000) recently pro- 
posed a generalization of the DIP test to an arbitrary number of modes. 
It is based on the Kolmogorov distance between the empirical distribution 
and the nearest distribution consisting of a mixture of uniform distributions 
with at most m modes. This is calculated using a taut string. Hartigan 
examines for each antimode of a taut string approximation the supremum 
distance between the empirical distribution function and a monotone den- 
sity on a "shoulder interval" including the antimode. Finally, Davies and 
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Kovac (2001) used the taut string methodology to control the number of 
local extremes of a nonpar ametric approximation to a data set. They also 
introduced the idea of local squeezing and residual driven tube widths, which 
greatly increases the precision and flexibility of the taut string methodology 

1.3. Smoothness. The taut string methodology produces densities which 
are piecewise constant and therefore not even continuous. Smoothness will 
not be a consideration in this paper but we point out that techniques 
for smoothing such functions have been developed. The idea is to obtain 
the smoothest density subject to shape and deviation constraints taken from 
the taut string. We refer to Metzner (1997), Lowendick and Davies (1998) 
and Majidi (2003). 

1.4. Previous work. Much work has been done on the problem of density 
estimation. One of the most popular methods is that of kernel smoothing. 
We refer to Watson (1964), Nadaraya (1965), Silverman (1986), Sheather 
and Jones (1991), Wand and Jones (1995), Sain and Scott (1996) and Si- 
monoff (1996) and the references given therein. The main problem here 
is the determination of appropriate global or local bandwidths. A further 
approach is based on wavelets. We refer to Donoho, Johnstone, Kerky- 
acharian and Picard (1996), Herrick, Nason and Silverman (2000) and Vi- 
dakovic (1999), Chapter 7. Mixtures of densities have been considered in 
the Bayesian framework by Richardson and Green (1997) and Roeder and 
Wasserman (1997). Other Bayesian methods are to be found in Verdinelli 
and Wasserman (1998). 

None of the above approaches is directly concerned with modality. For ex- 
ample, the non-Bayesian theory is generally based on integrated squared er- 
ror or some similar loss function. In spite of this, methods are often judged 
by their ability to identify peaks in the data as in Loader (1999) and Her- 
rick, Nason and Silverman (2000). Work directly concerned with modality 
has been done by Miiller and Sawitzki (1991) using their concept of excess 
mass. Their ideas have been extended to multidimensional distributions by 
Polonik (1995a, b, 1999). Hengartner and Stark (1995) used the Kolmogorov 
ball centered at the empirical distribution function to obtain nonparametric 
confidence bounds for shape restricted densities. Another way of controlling 
modality is that of mode testing. We refer to Good and Gaskins (1980), 
Silverman (1986), Hartigan and Hartigan (1985) and Fisher, Mammen and 
Marron (1994). 

2. The difficulties of the density problem. Obtaining adequate approx- 
imate densities is a special case of nonparametric regression. Whereas non- 
parametric regression is usually concerned with the size of the dependent 
variable, the density problem is concerned with measuring the degree of 
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closeness of the design points. In spite of a formal similarity, this is the more 
difficult problem and it may explain the modesty evident in the literature 
on densities. The difficulties may be illustrated by three data sets each of 
a sample size of n = 500. The first was generated using the standard normal 
distribution, the second using the uniform distribution on [0, 1] and the third 
using the so-called claw distribution which is the following mixture of five 
normal distributions: 

4 

0.5Af(0,l) +0.1^ M{i/2- 1,0.1). 

i=0 

This density will also be referred to as N5 (see Section 3.1). It is one of ten 
introduced by Marron and Wand (1992) to study the performance of differ- 
ent density methods. For each data set we calculated a kernel estimate with 
a global bandwidth which was chosen to be as small as possible subject to 
the estimate having the same modality as the density. Similarly for the taut 
string method we took the Kolmogorov ball to be as small as possible sub- 
ject to the estimate having the same modality as the density. The results 
are shown in Figure 1. 

The kernel method performs very well on the sample from the normal dis- 
tribution but the approximation to the uniform density is poor. It can only 
be improved by using a smaller bandwidth which then introduces superflu- 
ous modes. The approximation to the claw density is even worse. Only three 
peaks are correctly identified; the remaining two peaks are in the tails near 
—2 and 3, where the claw density does not have a peak. An explanation of 
this behavior can be found in Hartigan (2000), who discusses the relationship 
between the peaks and bandwidth for kernel estimates. 

The taut string method produces excellent approximations in all three 
cases. In particular, all five peaks of the claw density are correctly identified. 
The open problem is to produce an automatic procedure for the taut string 
method which will give good approximations on these and other test beds 
without knowledge of the number of modes. In the case of nonparametric 
regression such an automatic procedure is available and is reminiscent of 
hard thresholding for wavelets [Davies and Kovac (2001)]. Unfortunately, 
there seems to be no equivalent for densities and it is this which makes 
the density problem so difficult. 

3. Taut strings, Kuiper metrics and densities. 

3.1. Test densities. As part of the evaluation of the procedures to be 
defined below, we consider their performance on test beds defined by distri- 
butions. For the sake of convenient reference we list here the distributions 
we consider. M(fj,,a 2 ) refers to the normal distribution with mean /i and 
variance a 2 . 
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Fig. 1. Normal, uniform and claw density. The panels show kernel and taut string ap- 
proximations using the smallest bandwidth that retains the correct modality. 
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u 


the uniform distribution on [0, 1] 


Nl 


the standard normal distribution 


S 


the slash distribution, defined as AA(0, 1)/U(0, 1) 




[see Morgenthaler and Tukey (1991)] 


N2 


the mixture 0.5AA(0, 1) + 0.5AA(3, 1) 


NA 


the mixture 0.8AA(0, 3) + 0.015AA(8, 0.02) + 0.0157^(9, 0.02) + 




0.17^(15,0.2) 


N5 


the claw distribution 0.5AA(0, 1) + 0.1 Ei=o-^(V 2 - 1, 0-1) 


iV10_5 


the mixture 0.l£]£ 1 .A/'(5i - 5, 1) 


JV10_10 


the mixture 0.1Ei£iA/"(10i - 5, 1) 



3.2. Taut strings. We give a short description of the taut string method. 
A thorough analysis of properties of the taut string can be found in Hartigan 
(2000). Further details and an algorithm of complexity 0(n) are given by 
Davies and Kovac (2001). 

Consider a sample x n and form the ordered sample X( n ) = (xn\, . . . ,xr n \). 
For a given e > we consider the Kolmogorov tube T(E n ,e) centered at 
the empirical distribution E n and of radius e > 

T(E n ,e) = S.G-.G monotone sup|G(x) - E n (x)\ < e|. 

Imagine now a taut string taking the value of at xr\^ and 1 at xr n \ 
and constrained to lie within the Kolmogorov tube. Such a string is shown 
in the right panels of Figure 2 for two different values of e. The taut string 
defines a function S n on the interval [im^u]. Although S n depends on E n 
and e, we suppress this dependency to relieve the burden on the notation. We 
denote the density of S n by s n . It is defined as the left-hand side derivative 
of S n except at the smallest data point xry\ where we use the right-hand 
side derivative. The left panels of Figure 2 show histograms of the data with 
the corresponding densities s n superimposed. 

The taut string is a spline with knots at the points at which it touches 
the lower or upper boundaries of the Kolmogorov tube. The taut string has 
the following properties [see Davies and Kovac (2001) and Mammen and van 
de Geer (1997)]: 

(i) S n is monotonic increasing and linear between knots. 

(ii) s n is nonnegative and piecewise constant between knots. 

(iii) s n has the minimum modality of all functions whose integral over 
[x(!),X( n )] lies in T(E n ,e) and satisfies the end point conditions. 

(iv) S n switches from the upper boundary E n + e to the lower boundary 
E n — e at points where s n has a local maximum. 

(v) S n switches from the lower boundary E n — e to the upper boundary 
E n + e at points where s n has a local minimum. 
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Histogram of x 



y 



— 



3fl 



— 




Histogram of x 




Fig. 2. These figures illustrate the taut string method applied to a sample of mixture of 
normal distributions with two different tubewidths. The right column shows the tubes and 
the taut strings while the left column shows histograms of the data and the corresponding 
densities of the taut string. 



(vi) If £j and are consecutive knots on the same boundary, then on 
the interval 

(3-D s n ( X ) = l^<^WI , 

It is property (iii) which is of importance and allows control of the number 
of modes. If consecutive knots £j and are on opposite boundaries, then 
it follows from (iv) and (v) above that (3.1) must be replaced by 

(3.2) ^ <x,< & 0| ± 2e 

with a minus sign at local maxima and a plus sign at local minima. This 
means that the derivative underestimates local maxima and overestimates 
local minima. In an earlier version of this paper we followed Davies and 
Kovac (2001) and modified string S n by setting 

(3.3) S n (^j) = E n (£j) at all knots £j 
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and linear in between. The corresponding derivative s n satisfies 

(3.4) Sn(£j) = — Xl ~5 J \ +1 ^ between the knots and Cj+i- 

This modification has no effect on the modality and in general produces more 
pronounced peaks. More by good luck than by good thinking, the authors 
fortunately noticed that much improved results can be obtained by not mod- 
ifying the taut string in this manner. The reason is that this alteration causes 
both the taut string and the empirical distribution to have the same mass on 
intervals defining local extremes. Below we shall use Kuiper metrics which 
are defined by those intervals where the difference is greatest. The idea is 
that differences in distributions with different peaks should be greatest on 
intervals defining peaks. Modifying the taut string as in (3.4) nullifies this 
effect. Nevertheless, the final density, which is returned by the procedure, is 
modified in this manner. 

3.3. Data analysis. Even without an automatic procedure, the taut string 
can be used as a data analytical tool. If the radius of the Kolmogorov tube 
is monotonically decreased, then the number of modes of the derivative of 
the taut string increases monotonically. It is therefore possible to specify 
the number of modes of the approximate density. Figure 3 shows this for 
the same sample as used for Figure 1. The densities of Figure 3 can be in- 
terpreted as histograms with an automatic choice of the number of bins and 
the bin widths. To measure the performance of the taut string procedure, we 
simulated samples of different sizes from the claw distribution and squeezed 
the tube as far as possible consistent with the density having five peaks. A 
peak is classified as being correctly identified if the midpoint of the interval 
defining a peak differs by less than 0.15 from the position of the nearest 
peak of the claw density. Figure 4 shows the number of correctly identified 
peaks as a function of sample size. 

It shows that the taut string method is extremely good at finding peaks. 
For samples of size 200, the five peaks will be correctly identified in over 
80% of the cases. This in a sense confirms Loader (1999), who, on the basis 
of theoretical results of Marron and Wand (1992), claims that for samples 
of size n = 193 the claws should be detectable. The problem we now ad- 
dress is the difficult one of defining an automatic procedure with a similar 
performance. 

3.4. An automatic procedure. The following theorem is an immediate 
consequence of the properties of the taut string listed above. 

Theorem 3.1. The derivative s n of the taut string constrained to lie in 
the tube T(E n ,qu(n,a,dKo)) is a solution of the Kolmogorov density prob- 
lem. 
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Fig. 3. Six taut string estimates of a sample of the claw distribution with increasing 
number of modes. 
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Fig. 4. Five-modal taut string: number of correctly identified peaks of the claw density 
as a function of sample size. 



For finite n the values of qu(n,a,d,Ko) can be obtained by simulation. In 
the limit i/nqu(n, a, c^ko) tends to the corresponding quantile of 

max Bn(t) — min Bn(t), 
o<t<i o<t<i 

where -Bo denotes a Brownian bridge and for which an explicit expression 
exists [Dudley (1989)]. 

The solution of the Kolmogorov density problem therefore defines an au- 
tomatic procedure based on the taut string and its performance can be 
evaluated on different test beds. If we do this on an i.i.d. test bed, that is, 
with data of the form X±(F), . . . , X n {F) where F has a k- modal density 
function /, then it is clear that the taut string density s n will have at most 
k modes with probability at least a. This follows on noting that F lies in 
the tube with probability a and that in this case s n has at most as many 
modes as /. In particular, if k = 1, we have the following. 

Theorem 3.2. Let X\(F) , . . . , X n (F) be an i.i.d. sample with common 
unimodal distribution F and let s n be the solution of the Kolmogorov density 
problem (1.2). Then 



(3.5) 



P(s n unimodal) > a. 
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A simulation was performed to investigate the performance of the proce- 
dure with a = 0.9 and the corresponding tube width 1.245/- v /n on test beds 
defined by the distributions listed in Section 3.1. The results are shown in 
Table 1. It is clear that for a unimodal distribution the modality is correctly 
estimated with probability at least 0.9 in accordance with Theorem 3.2. In- 
deed the actual probability greatly exceeds 0.9 as all simulations resulted 
in exactly one peak. The results for the other distributions are, in contrast, 
disappointing. Asymptotically the modality will be correctly estimated with 
probability at least 0.9 but the rate of convergence is clearly very slow. We 
now try and obtain an improved procedure in two ways. First we note that 
the choice of qu(n, a, dxo) f° r the radius of the tube means that a probabil- 
ity of at least a is guaranteed for all unimodal test beds. If we provisionally 
accept that the uniform distribution is a poor model for most data sets, 
then we may accept a worse performance for the uniform distribution in 
return for enhanced performances for other distributions. Silverman (1986) 
and Miiller and Sawitzki (1991) argue in a similar vein. The second way 
of gaining an improved performance is to use a generalized Kuiper metric 
rather than the Kolmogorov metric. Kuiper metrics consider the differences 
in probability over a fixed number of disjoint intervals and are therefore 
better at detecting modality. 

3.5. Calibrating unimodality. To implement the first way of improving 
performance, let qu(n, a, F, 1, c?ko) denote the a-quantile of the Kolomogorov 
distance of the closest unimodal distribution (given by the taut string) to 
the empirical distribution F n of n i.i.d. random variables with common dis- 
tribution F. We have the following theorem. 

Theorem 3.3. Let X\(F), . . . ,X n (F) be an i.i.d. sample with common 
unimodal distribution F and empirical distribution F n . Let s n be the deriva- 



Table 1 

The procedure using the 0.9-quantile of the Kolmogorov metric. The numbers give the 
percentage of simulations in which the correct modality was obtained. The numbers in 
parentheses give the mean absolute deviation from the correct modality. The results are 

based on 1000 simulations 



Dist. 


u 


s 


Nl 


N2 


N4 


JV5 


iV10_5 


iV10_10 


100 


100 (0) 


100 (0) 


100 (0) 


0(1) 


(2.34) 


0(4) 


0(9) 


0(9) 


500 


100 (0) 


100 (0) 


100 (0) 


0(1) 


0(2) 


0(4) 


0(9) 


(8.6) 


1000 


100 (0) 


100 (0) 


100 (0) 


0(1) 


0(2) 


0(4) 


(9) 


(7.9) 


5000 


100 (0) 


100 (0) 


100 (0) 


50 (0.5) 


0(2) 


0(4) 


(8.3) 


100 (0) 


10000 


100 (0) 


100 (0) 


100 (0) 


100 (0) 


0(2) 


66 (0.4) 


99 (0.01) 


100 (0) 
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tive of the string S n through the tube T(F n , qu(n, a, F, 1, g?ko))- Then 

(3.6) F(s n unimodal) = a. 

Clearly 

qu(n, a, F, 1, d KO ) < qu(n, a, d K o), 
but it is not clear whether 

sup qu(ra, a, F, 1, g?ko) = qu(n, a, c?ko)- 

F unimodal 

We point out that the uniform distribution does not maximize qu(n, a, F, 1, oIko) 
[Hartigan and Hartigan (1985)]. We now take F = U to be the uniform dis- 
tribution on the basis that it is not an adequate approximation for most 
data sets and set a = 0.5. This means that on uniform test beds the modality 
will be correctly determined with probability 0.5. The uniform distribution 
has the advantage that the asymptotics of the quantiles qu(n, a, U, 1, cIko) 
can be calculated. We have 

(3.7) lim y/nqu(n, a, U, 1, <iKo) = qu(a, Bq), 

n — too 

where qu(a, Bq) denotes the a-quantile of the random variable 

(3.8) minsup \Bq(x) — H(x)\, 

H x 

where the function H : [0, 1] — » M. is convex on [0,ijf] and concave on 1] 
for some if/,0 < tn < 1- Simulations show that the 0.5-quantile of (3.8) is 
0.432. A correction for finite n gives 

qu(n, 0.5, U, 1, d KO ) = 0.43/v 7 ^ - 0.64/ra, 

with a percentage error (based on simulations) of at most 0.0045. Table 2 
shows the results. We see that the performance for the Gaussian test bed 
is hardly impaired. On the claw test bed we note that the performance for 
n = 1000 is now comparable to that of the simple Kolmogorov quantile for 
n = 10000. 

If we apply the same idea to the normal distribution, then heuristic ar- 
guments indicate that 

lim ^/nqu(n, a, 1), 1, c^ko) = 

n — too 

but we have no exact asymptotic result. The same argument goes through 
for any sufficiently smooth density. If true, this implies that if we use a cut- 
off point for the size of the Kolmogorov ball which is bounded below by some 
constant multiple of l/i/n, then the modality will be consistently estimated. 
We do not pursue this idea any further. 
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Table 2 

The procedure based on the 0.5-quantile of the Kolmogorov distance of the closest unimodal 
distri- bution to a uniform sample. The numbers give to the nearest integer the percentage of 
simulations in which the correct modality was obtained. The numbers in parentheses give the 

mean absolute deviation from the correct modality correct to one decimal place. The results 

are based on 1000 simulations 



Dist. 


s 


Nl 


N2 


N4 


iV5 


iV10_5 


TVTO.IO 


100 


100 (0) 


100 (0) 


22 (0.8) 


0(2) 


(3.8) 


0(8) 


(3.8) 


500 


100 (0) 


100 (0) 


78 (0.2) 


0(2) 


1 (2.5) 


(5.5) 


1 (2.5) 


1000 


100 (0) 


100 (0) 


95 (0) 


0(2) 


43 (0.7) 


27 (1.1) 


43 (0.7) 


5000 


100 (0) 


100 (0) 


99 (0) 


48 (0.6) 


100 (0) 


100 (0) 


100 (0) 


10000 


100 (0) 


100 (0) 


100 (0) 


100 (0) 


100 (0) 


100 (0) 


100 (0) 



3.6. Kuiper metrics. Suppose that the density s n of the taut string is 
unimodal. Part of the description of the taut string S n given in Section 3.2 is 
that it switches from the upper bound to the lower bound at each maximum. 
Consider now the Kuiper metric cZku defined by 

(3.9) d KV (F,G)= S up{a <b:\(F(b) - F(o)) - (G(b) - G(a))\}. 

It follows from the above that if dKo(E n ,S n ) =e and s n is unimodal, 
then dK\j(E n , S n ) = 2e. The a-quantile qu(n, a, c2ku) of dKu(F n ,F) is in- 
dependent of F for continuous F and is less than twice the a-quantile of 
dKo(F n , F). This suggests that the Kuiper metric is more appropriate for 
unimodality than the Kolmogorov metric. To demonstrate this we firstly 
define the Kuiper problem. 

Problem 3.1 (Kuiper density problem). Determine the smallest inte- 
ger k n for which there exists a density f n with k n modes and whose distri- 
bution F n satisfies 

d K \j(E n ,F n ) < qu(n, a, d KV ). 

Suppose now that F n is a unimodal distribution which solves the Kuiper 
density problem. Let 

e\ = max{x : F n (x) — E n (x)} and 

e 2 = max{x : G(x) - F n (x)}. 

As d\^jj{E n ,F n ) = s\ + E2 = qu(n, a, (2ku)) it follows by shifting F n by an 
amount 5 1 £ 2 — | that the solution of the Kolmogorov problem with e = |qu(n, a, g?ku) 
is also unimodal. As ^qu(n, a, cZku) < qu(n,a,dKo)) this implies that if 
the solution of the Kuiper density problem for a given a is unimodal, so 
is the solution of the Kolmogorov problem for the same a. 
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To cover the case of multimodality we define the Kuiper metric d^u °f 
order k by 

(3.10) dfa(F,ty=m^fc\(F(b j )-F(a j )) - {G{b 3 ) - G{a 3 ))\ j, 

where the maximum is taken over all aj,bj with 

a± < h < a 2 < b 2 < ■ ■ ■ < a K < b K . 

Again the distribution of d^(F n ,F) is independent of F for continuous F. 
If we denote the a-quantile by qu(n, a, c^u)' we can formulate the K-Kuiper 
problem. 

Problem 3.2 (/t-Kuiper density problem). Determine the smallest in- 
teger k n for which there exists a density f n with k n modes and whose dis- 
tribution F n satisfies 

4 u ( J E n ,F n )<qu(n,a,^ u ). 

If the density s n of the taut string has k modes, then for the Kuiper 
metric d^J 1 of order 2k — 1 we have 

d$ J J 1 (E Tn ,S n ) = (2k-l)e. 

This follows on noting that the string switches boundaries at each of the k 
local maxima of s n and also at the k—1 local minima. As 

qu(n, a, d^ 1 ) < (2k - 1) qu(n, a, d K o), 

this indicates that the Kuiper metric d^J 1 is more efficacious when the data ex- 
hibit k modes. We have no simple algorithm for solving the K-Kuiper prob- 
lem so we use the strategy of Davies and Kovac (2001) and decrease the ra- 
dius e of the Kolmogorov tube gradually until 

For large n approximations to qu(n, a, e^u) are available using the weak 
convergence result 

V^d^(F n ,F) => maxj^ |fl (&i) - ^o(oj)l|, 

where Bq denotes the standard Brownian bridge on [0, 1] and 

ai < b\ < a 2 < b 2 < ■ ■ ■ < a K < b K . 

The distribution of m&x{\Bo(b) — Bo(a)\} corresponding to the unimodal 
case k = 1 is known [e.g., Dudley (1989), Proposition 12.3.4.]. Sufficiently 
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accurate quantiles for finite n and for the other asymptotic cases may be ob- 
tained by simulations. Best results are obtained if k is related to the modality 
k of the test bed by k = 2k — 1. In practice a default value of k is required 
and we use k = 19. 

We combine the K-Kuiper metric with the ideas of Section 3.5. Let qu(n, a, F, 1, d^u) 
denote the a-quantile of the K-Kuiper distance of the closest unimodal dis- 
tribution to the empirical distribution F n of n i.i.d. random variables with 
common distribution F. We use the string S n as the closest unimodal distri- 
bution. If F is the uniform distribution of [0, 1], then we have again a 1/y/n 
asymptotic. For example, for k = 19 and a = 0.5, simulations showed that 

qu(n, 0.5, U, 1, d$ v ) ~ 8.12/Vn - 30.32/n L04 

is a good approximation. 

The results shown in Table 3 confirm the claim that the Kuiper metric 
with n = 2k — l performs best on test beds with k modes. Thus the procedure 
based on the d^jj metric is best for the bimodal distribution N2, that based 
on the dq^u metric is best for the five- modal claw density iV5, while that 
based on the d^jj metric is best for the two ten-modal distributions iV10_5 
and iVT0_10. None of the procedures performs well for the four-modal NA 
distribution. This is because it has two very concentrated but lower power 
peaks situated at the points 8 and 9. For this distribution global squeezing of 
the Kolmogorov tube will only work for large sample sizes. In small samples 
when the tube is sufficiently narrow to pick up the lower power peaks, it 
will have already caused peaks to appear at other points. This is shown by 
Table 4. For the sample sizes shown the tube was squeezed to give just four 
peaks and it was then checked if the four peaks were the correct ones. Table 4 



Table 3 

Results for the procedures using the 0.5-quantile of the closest unimodal distribution in 
the Kuiper metrics based on 3, 9 and 19 intervals. The numbers give the percentage 
of simulations in which the correct modality was obtained. The numbers in parentheses give 
the mean absolute deviation from the correct modality. The results are based on 1000 
simulations with sample sizes of 250 and 500 



Dist. 


S 


Nl 


N2 


N4 


N5 


AT10_5 


JVIO.IO 


n = 


250 
















k = 


3 


99 (0) 


96 (0) 


67 (0.3) 


0(2) 


(2.9) 


(6.7) 


38 (0.8) 


k = 


9 


100 (0) 


99 (0) 


59 (0.4) 


(1.9) 


20 (1.5) 


(3.4) 


95 (0) 


k = 


19 


100 (0) 


96 (0) 


53 (0.5) 


1 (1-9) 


20 (1.5) 


(1.0) 


99 (0) 


n — 


500 
















k = 


3 


99 (0) 


99 (0) 


90 (0.1) 


0(2) 


10 (1.7) 


(3.9) 


100 (0) 


k = 


9 


100 (0) 


99 (0) 


74 (0.3) 


1 (1-9) 


70 (0.3) 


50 (0.6) 


100 (0) 


k = 


19 


100 (0) 


99 (0) 


66 (0.3) 


2 (1.9) 


57 (0.5) 


97 (0) 


100 (0) 
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Table 4 

Results of global squeezing for the four-modal distribution N4. The 
Kolmogorov tube was squeezed to give exactly four peaks. The 
numbers give the percentage of simulations in which these were 
the correct peaks. The results are based on 1000 simulations 



n 


500 


1000 


2000 


4000 




3 


23 


81 


99 



gives the percentage of cases when this was the case. Thus even for a sample 
of size 2000, the correct peaks were only found in 80% of the cases. The 
problem is related to that of detecting low power peaks in nonparametric 
regression. In Davies and Kovac (2001) the problem was solved using local 
squeezing. In Section 5 we introduce a form of local squeezing for densities 
which is based on cell occupancy frequencies. 

3.7. Discrete data. So far we have looked for an approximation to the data in 
the form of a Lebesgue density. However, at little cost we can extend the method- 
ology to integer- valued data which typically arise from counts. Suppose 
the data set x n = (xi, . . . , x n ) contains only N different values t\ < < ■ ■ ■ < 
tjv- We look for an approximation in terms of N probabilities pj = ¥(X = tj), 
j = 1, . . . , N, where the random variable X has support t\ < £2 < ■ • • < 
Let ex, . . . , ejy be the empirical frequencies of the tj in the data and consider 
the cumulative sums 

3 

Ej =Y, e i 

i=l 

and the tube constructed by linear interpolation of the points (j/N,Ej), 
j = 0, . . . , N. Differentiating yields an approximation of p±, . . . ,pn- This pro- 
cedure corresponds to the taut string algorithm in the regression context 
[Davies and Kovac (2001)] with time points t\, . . . ,t n and with observations 
ei, . . . , e n . Our default procedure uses the K-Kuiper metric with k = 9 and 
a = 0.5. We point out that this radius is conservative for discrete data, 
but we do not pursue this any further. Other forms of approximation can 
be accommodated without much difficulty. An example is shown in Figure 5 
where the discrete taut string method was applied to 1200 observations from 
a mixture of three Poisson distributions, 

0.25P(2) + 0.5P(8) + 0.25P(21). 

The other situation is where repeated values occur not because of the na- 
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Poisson Mixture Discrete Taut String Approximation 




i 1 1 1 1 1 1 1 — n 1 1 1 1 1 1 r 

5 10 15 20 25 30 35 5 10 15 20 25 30 35 



Fig. 5. Discrete data. The left panel shows the density function of the mixture of three 
Poisson distributions and the frequencies of a sample of size 1200. The discrete taut string 
approximation is shown in the right panel. 



ture of the data (counting) but because of rounding. The rounding of data is 
very common and it can cause difficulties when looking for an approxima- 
tion based on Lebesgue densities. To see the difficulties assume that some 
data point x is observed k times. Depending on the exact implementation of 
the taut string algorithm, two problems may occur. If the tube is centered 
around the empirical distribution function and the tube width is smaller than 
k/2n, the derivative of the taut string at x will be oo. If, on the other hand, 
the tube is constructed by linear interpolation of the empirical distribution 
function, then the empirical mass at x of fc/n is spread over the interval 
where xi is the largest data point smaller than x. To overcome these 
problems we propose the following. Let e be the precision or cut-off error 
which we set to e = 10 _3 MAD(x n ), where MAD denotes the median absolute 
deviation. We construct a modified data set x±, . . . ,x n , where the identical 
observations at x are equally spread over the interval [x — e/2,x + e/2]. To 
be precise, we replace = x (j+2) = • • • = by 

( 1 1 

X ^ = X + £ {-2 + 2k + —) 

for i = 1, . . . , k. The taut string method described above is then applied to 
x instead of x. 
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4. Asymptotics on test beds. The asymptotic behavior of the taut string 
may be analyzed on appropriate test beds. It turns out that asymptotically 
the modality is correctly estimated and that the optimal rate of conver- 
gence is attained except in small intervals containing the local extremes of 
the density /. 

We denote the modality of the derivative of the taut string in the supre- 
mum tube T(F n ,C/y/n) by The taut string based on the radius Cj^fn 
will be denoted by with derivative s~\ We write If{n, C),l<i< k% , for 
the intervals where s„ attains its local extreme values and denote the mid- 
points of these intervals by t?(n,C), 1 < i < fe„ . The length of an interval / 
will be denoted by |/|. 

Theorem 4.1. Let f be a k-modal density function on R such that 

min \F(x) - G(x)\ > 0. 

g,(k— l)-modal 

Then we have, for all 5 > 0, 

lim liminfpf {}£ = k} n ( max \If(n,C)\ < s\ 

C'^oo n^oo \ ll<i<kg J 

n( max \tf(n,C)-t e A <s\) =1. 

In the following A denotes a generic constant which depends only on / 
and whose value may differ from appearance to appearance. 

Theorem 4.2. Assume that 

(i) / has a compact support on [0,1], 

(ii) / has exactly k local extreme values at the points < t\ < ■ ■ ■ < t| < 1, 
(hi) / has a bounded second derivative f^ which is nonzero at the k local 

extremes, 

(iv) /( 1 )(t)=0 only forte {tf,...,t e k }. 

Then the following statements hold: 
(a) 

lim liminfP(t. e G If(n,C),l < i < k) = 1. 

C^oo n^oo 

(b) For every C\ < 6 and C2 > 12, 

JimJinnnfP(Vf (n,C)\ ( V ^ l/ ^ (tf)l ) ^ G [C\ /Z , C 2 1/3 ] , !<<<*)=!. 
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c) Let be the knots of the taut string S% and denote 

m(n, C) = max j$S - if : ^° , € (0, 1)\ [J if (n, C) j . 
For some constant A only depending on f , we have 

1 /3 

lim HminfP (m(n,C) < ( A\f^\ Xj )\~ 2/3 (^^j )) = 1. 



(d) Denote 

M(n,C) 



logrA 1 / 3 /lograN x / 3 



n / V n 

Then for some constant A only depending on f , we have 

<&^ p Ufei««)-Jfwi^(^wi I/ '( !! F) I/ *))= 1 - 

(e) For some constants A\ and A2 only depending on f , we have 

lim liminf pf max I / (t) - f£(t)\ < AC 2/3 n~ l /A = 1. 

C ^ n-,00 ViGU^f(n,C)' W " W ' - J 

Part (d) of the theorem shows that, bounded away from the local ex- 
trema, the taut string density attains the optimal rate of convergence up to 
a logarithmic factor. The proofs follow the lines of Davies and Kovac (2001) 
and we omit them. 



5. Cell occupancy frequencies and local squeezing. 



5.1. Cell occupancy frequencies. The multiresolution procedure of Davies 
and Kovac (2001) is based on comparing the residuals of some regression 
function with those of Gaussian white noise. The comparison is based on 
the means on intervals which form a multiresolution scheme. A similar 
idea can be applied to the density problem. A distribution F is an ade- 
quate model for the data x n = (x±, . . . , x n ) if the transformed data 

u n = F(x n ) = (F(x 1 ),...,F(x n )) 

looks like an i.i.d. sample of size n from the uniform distribution on [0, 1]. 
This is done by comparing the frequencies 

w] k = \{l: k2' j <u l <(k + l)2~ j }\, < k < 2*, 1 < j < m, 

with those to be expected from i.i.d. uniform random variables. The maxi- 
mum resolution level m is taken to be the smallest integer such that n < 2 m . 
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Suppose that Ui,...,U n are independently and uniformly distributed on 
[0,1]. Then 

Wp k = \{l:k2~3 <Ui<(k + 1)2^}\ 
is binomially distributed b(n,2~ 3 ). For given a we define the upper bounds 

(5.1) v^(a) = mm\v:¥(Zf>v)<-^\, 

J I 2n J 

where Z™ satisfies the binomial distribution b(n, 2~- J ). It follows that 
F(W? k < vj(a), 1 < k < 2 j , 1 < j < n) > a. 

Lower bounds A™ fc (a) can be derived similarly. This gives rise to the following 
problem. 

Problem 5.1 (Cell occupancy problem). Determine the smallest integer 
k n for which there exists a density f n with k n modes and whose distribu- 
tion F n is such that the cell frequencies w™ k satisfy 

(5.2) \](a)<wl k <v?(a), 
where the v™ k (a) are given by (5.1). 

Although the cell occupancy problem is well defined, there is no obvious 
connection between the modality of the density f n and the set of inequal- 
ities (5.2). We therefore again adopt the strategy of producing test densi- 
ties derived from the taut string and gradually increase the modality until 
the inequalities (5.2) hold. The knowledge of which inequalities fail to hold 
provides further information which we are able to exploit as described in 
the next section. 

5.2. Local squeezing. Local squeezing is described in Davies and Kovac 
(2001). We apply it to the density problem as follows. Suppose that one of 
the inequalities of (5.2) fails. We suppose that 

wl k = \{l: k2~i < F n { Xl ) <{k + 1)2-'}| > v* k (a). 

Clearly, there exists an interval [xnu,xn2)] such that k2~ J < F n (xi) < (k + 
1)2~ J for all points X[ in [xm),X(i2)]- We now squeeze the tube locally on 
this interval and do this for all intervals where the upper inequality fails. For 
coefficients Wj tk we proceed similarly but use slightly larger intervals such 
that k2~ J < F n (xi) <{k+ 1)2~ J for all points xi in (xnu, xi^yj. The general 
procedure for doing this is as follows. First, a suitable initial global tube 
radius 70 is chosen using the Kolmogorov or generalized Kuiper metrics and 
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the taut string is calculated. If all the inequalities (5.2) hold, the procedure 
terminates. If not, we reduce the radius by a factor p, < p < 1, on all 
intervals where an inequality fails. Typical choices for p are 0.9 or 0.95. 
The taut string through the modified tube is calculated and, using this new 
test distribution, it is checked whether the inequalities (5.2) hold. If so, 
the procedure terminates. Otherwise the tube radius is again decreased by 
the factor p on all intervals where an inequality fails. This is continued until 
all the inequalities are satisfied. 

It is not easy to analyze the behavior of the local squeezing procedure. In 
the case of nonparametric regression Davies and Kovac (2001) give a heuris- 
tic argument indicating that the procedure improves the behavior at local 
extremes. A similar argument can be given for densities, but as it remains 
heuristic we omit it. 

The ability of the local squeezing method to detect low power peaks [see 
Davies and Kovac (2001)] is shown by the following example. The data con- 
sist of a sample of size 1000 drawn from the four-normal distribution iV4 
of Section 3.1. The density is shown in the upper left corner of Figure 6. It 
exhibits a main peak, a moderate peak on the right and in the center two 
low power but very concentrated and very close peaks. 

The upper right panel shows a kernel estimate which was calculated us- 
ing a Gaussian kernel. The mode on the right-hand side was detected, but 
is considerably broader than the normal component of the original density 
function. The main component is well captured but there are three super- 
fluous peaks. Finally, the two sharp peaks in the center of the data result in 
one flat local maximum. The lower left panel shows the result with the taut 
string method and two global tube radii. The solid line is derived from 
the d^u metric. There are no spurious local extremes but the small central 
peaks are not detected. The dashed line shows that further global squeezing 
would only lead to additional spurious modes on the left before the central 
peaks are detected. Finally, the lower right panel shows the result of local 
squeezing. The number and locations of the local extrema are estimated 
correctly and the difference with respect to the original density function is 
very small. 

Table 5 shows the performance of the local squeezing procedure for the dis- 
tribtions S, Nl, N2, N4, N5, iV10_5 and iV10_10 for samples of sizes 250 
and 500. The procedure was calibrated as for the Kuiper metrics but, due 
to the discrete nature of the cell counts, it was not possible to adjust the pa- 
rameters so that in 50% of the cases the modality for uniform samples was 
one. The choice lay between 48% and 55% and we took the latter. The 
results show a much enhanced performance for the distribution N4, but 
the results for the other distributions are worse than for the Kuiper metrics. 
This suggests a compromise procedure. 
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Table 5 

Results for the local squeezing procedure. The numbers give the percentage of simulations in 
which the correct modality was obtained. The numbers in parentheses give the mean absolute 
deviation from the correct modality. The results are based on 5000 simulations with sample 

sizes of 250, 500 and 1000 



Dist. 


S 


Nl 


N2 


N4 


N5 


iV10_5 


JV1CL10 


n = 250 
n = 500 
n = 1000 


91 (0.1) 
89 (0.1) 
88 (0.1) 


83 (0.2) 
80 (0.2) 
79 (0.3) 


42 (0.6) 
45 (0.6) 
54 (0.5) 


1 (1.6) 
22 (0.9) 
75 (0.3) 


4 (2.2) 
17 (1.5) 
43 (0.8) 


2 (2.9) 
36 (0.9) 
91 (0.1) 


99 (0) 
100 (0) 
100 (0) 



5.3. Compromise default procedures. Statistical procedures make no as- 
sumptions about the data [Tukey (1993a)] and consequently are required to 
be compromises [see Tukey's example of the milk bottle in Tukey (1993b)]. 
Given a Kuiper metric ^ku> we calibrate the procedure based upon it so 
that in 60% of the cases the approximation to uniform samples is unimodal. 
Local squeezing is then applied so that the final approximation is unimodal 
in 50% of the cases. Again due to the discrete nature of the cell counts, 50% 

Histogram of locsqex2 



m - 




-10 -5 5 10 15 -10 -5 5 10 15 

locsqex2 



Histogram of locsqex2 Histogram of locsqex2 




locsqex2 locsqex2 

Fig. 6. Local squeezing. The upper left panel shows the density of AM. A kernel estimate 
is shown in the upper right panel. The lower left panel illustrates global squeezing first 
with a solid line using the Kolmogorov bounds and then with a dashed line the taut string 
density with four modes. The local squeezing estimate is depicted in the lower right panel. 
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Table 6 

Results for the compromise procedure based on d^ v . The numbers give the percentage of 
simulations in which the correct modality was obtained. The numbers in parentheses give 
the mean absolute deviation from the correct modality. The results are based on 1000 
simulations with sample sizes of 250, 500 and 1000 



Dist. 


S 


Nl 


N2 


N4 


N5 


JV1CL5 


iV10_10 


n = 250 
n = 500 
n = 1000 


97 (0) 
97 (0) 
99 (0) 


93 (0.1) 

94 (0.1) 
98 (0) 


51 (0.5) 
64 (0.4) 
86 (0.1) 


2 (1.8) 
19 (1.1) 
82 (0.2) 


17 (1.6) 
60 (0.5) 
99 (0) 


40 (0.9) 
95 (0) 
100 (0) 


99 (0) 
100 (0) 
100 (0) 



is not exactly attainable so we take the smallest percentage higher than 50. 
A second choice is to standardize the Kuiper procedure so that in 95% of 
the cases the approximation to uniform samples is unimodal. This is then 
reduced to 90% using local squeezing. We modify the local squeezing pro- 
cedure as follows. Instead of using a multiresolution scheme we consider all 
intervals of length at most y/n. This results in a procedure of 0(n ) but 
easily calculable for sample sizes of 50,000 and more. The reasoning behind 
this alteration is that we use local squeezing only to detect low power con- 
centrated peaks. The others should be detected by the preceding Kuiper 
procedure. For reasons of space and comprehensibility we do not give an 
exact description of the local squeezing procedure but the source code is 
available from our web site. This leaves open the choice of k in c^u- The 
software is available for all choices k = 1, 3, . . . , 19 with the default choice 
k = 19. If data is to be analyzed in a routine manner, k can be chosen on 
the basis of experience or knowledge of the data involved. 

5.4. Further simulations. We now evaluate the two procedures COMPKU19_50 
and COMPKU19_90 which are the compromise procedures described in 
the previous section using the Kuiper metric d^j and calibrated at the uni- 
form distribution to give the correct modality with probabilities 0.5 and 0.9, 
respectively. We compare them with two kernel-based methods. The first 
KERNCV uses likelihood cross-validation for the choice of the bandwidth, 
while the second KERNSJ uses the Sheather-Jones plug-in bandwidths. The 
comparisons are performed using the ten densities shown in Figure 7. They 
are taken from Marron and Wand (1992) and are the uniform distribution on 
[0,1], the Gaussian distribution and eight mixtures of normal distributions. 

Each method was applied to 1000 samples of each of the densities and 
three different sample sizes (100, 500, 2000). For each estimate it was checked 
if the correct number of modes was found and if the positions of the modes 
corresponded to those of the densities. Table 7 shows how often the modes 
were determined correctly for the various densities and methods. Some com- 
ments are in order. First, if we use the procedure COMPKU5_50 which is 
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Skewed bimodal 




Smooth comb 



Discrete comb 





7. Ten densities that were used in a simulation study. 
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tuned to three modes, then the performance for the trimodal density im- 
proves. For n = 500 three modal values are found in 20% of the cases, and 
for n = 2000 this rises to 37%. Second, all the densities are mixtures of 
a small number of Gaussian distributions with the exception of the uniform 
density for which the kernel methods based on a Gaussian kernel fail. The 
trimodal distribution is the one where the kernel methods perform clearly 
better than the taut string method. If, however, the central Gaussian distri- 
bution is replaced by a uniform distribution, then the kernel methods again 
fail. We refer to Hartigan (2000) for an explanation of this. It indicates that 
the comparison is weighted in favor of the kernel methods as both they and 
the densities are based on the Gaussian kernel. We note that the performance 
of the kernel methods seems to deteriorate with increasing sample size. 

6. Hidden periodicities, spectral densities and taut strings. 

6.1. Hidden periodicities. The second problem we consider is that of de- 
tecting hidden periodicities in a data set x n . One method of formulating 
the problem is the following: calculate an appropriate spectral density func- 
tion f n and identify the hidden periodicities in the data with the peaks of 
f n [Brillinger (1981), Priestley (1981) and Brockwell and Davis (1987), and 
the references given therein]. 

Existing methods by and large belong to one of two different categories 
of procedures. The first is nonparametric and uses some form of smoothing 
of the periodogram. This may take the form of kernel estimators or splines 
or wavelets or averages of periodograms obtained by splitting the data into 
blocks [see Brillinger (1981), Chapter 5, Neumann (1996) and the references 
given therein]. The second possibility is to model the data by an autore- 
gressive process whose order is determined using some criterion such as AIC 
[Akaike (1977)], BIC [Akaike (1978)] or HQ [Hannan and Quinn (1979)]. The 
spectral density associated with the autoregressive process is then used to de- 
termine the hidden periodicities. None of these methods controls the number 
of peaks directly although the problem of hidden peaks is one of modality. 

Before proceeding, we assume that the data have been normalized to have 
sample mean zero and variance 1. To ease the notation the transformed 
data will also be denoted by x n . In the context of time series e n will denote 
the empirical spectral density or the periodogram defined by 

n 2 

^2x t exp(iiv t) , 

t=i 
(6.1) 

< oj < 2ir. 

The corresponding empirical spectral distribution function E n is given by 

(6.2) E n (u)= {" e n (\)d\. 

Jo 



e n (cj) 



2im 
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Table 7 

Correctly detected modes in samples of various densities and for several 
automatic methods 



Density 


Size 


KERNCV 


KERNSJ 


COMPKU19.50 


COMPKU19.90 


Uniform 


100 


1 


16 


50 


91 




500 





1 


53 


89 




2000 








53 


91 


Gaussian 


100 


77 


79 


85 


98 




500 


79 


78 


95 


99 




2000 


74 


59 


98 


99 


Strongly skewed 


100 


4 





90 


99 




500 


1 





96 


100 




2000 








99 


99 


Outlier 


100 


15 





90 


99 




500 








97 


100 




2000 








98 


100 


Bimodal 


100 


71 


81 


45 


14 




500 


75 


84 


68 


33 




2000 


75 


7:5 


97 


92 


Skewed bimodal 


100 


32 


46 


.34 


9 




500 


45 


37 


35 


13 




2000 


34 


12 


49 


22 


Trimodal 


100 


29 


12 


11 


1 




500 


57 


67 


11 


2 




2000 


81 


82 


20 


6 


Claw 


100 


1 





4 







500 


2 


2 


63 


34 




2000 








100 


100 


Smooth comb 


100 


18 





1 







500 


5 





5 


1 




2000 


1 


1 


89 


80 


Discrete comb 


100 


12 





1 







500 


2 





31 


13 




2000 





82 


98 


99 



The candidate spectral densities we use are based on the taut strings S n 
through the Kolmogorov tubes centered at E n . We assume that the taut 
string is constrained to go through (0,L n (0)) and (2tt, E n (2ir)) = (27r, 1), 
where L n denotes the lower boundary. 

One difference with respect to the i.i.d. model is the fact that the empirical 
spectral distribution function is defined for all u. In practice a grid must be 
chosen which, when analyzing the asymptotic behavior on test beds, becomes 
increasingly fine. We use the Fourier frequencies j = 0, . . . ,n — 1, where 
the data have been augmented by zeros to produce a power of 2. Choosing 
a finer grid has had no effect on the data sets we have analyzed so far. 
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Fig. 8. Sunspot data with number of peaks increasing from one to four. 



6.2. Data analysis. Just as in Section 3.3, it is possible to use the taut 
string as a data analytical tool. The radius of the Kolmogorov tube is grad- 
ually decreased and the resulting densities give information about the power 
and positions of the peaks. We give two examples. Figure 8 shows the first 
four peaks for the sunspot data [Anderson (1971)]. 

The second example is an artificial data set generated according to a scheme 
of Gardner (1988). Gardner does not explicitly specify the spectral density 
except that it has Gaussian shape with center frequency 2ir\ with A = 0.35. 
The density / of (6.3) approximates the graph shown in Gardner's Fig- 
ure 9.4(a) [Gardner (1988)]: 

(6.3) /H = i e - 300 ^- - 35 ) 2 . 

A realization of length 2048 was generated by filtering in the frequency 
domain. The following pure sine terms were added: 

\/2sin(27r(0.2t - 106/360)), 

\/2sin(27r(0.21t - 45.1/360)), 

v / 2/10sin(2vr(0.1t - 32.6/360)). 

A segment of length 256 starting at t = 1023 was taken as the simulated 
sample. It is shown in Figure 9. 
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Fig. 9. The Gardner data. 

A similar data set was analyzed by Gardner [(1988), Chapter 9.E, Ex- 
perimental Study] in an experimental study of the performance of different 
spectral estimates. Figure 10 shows the first four peaks (in a log scale) for 
the data set of Figure 9. Finally, Figure 11 shows the four-peak density 
together with the periodogram. 

6.3. Two concepts of approximation. The concepts of approximation used 
in the i.i.d. case had the advantage that the distributions involved were inde- 
pendent of the approximating model. This is no longer the case for stationary 
models. Furthermore, specifying the spectral distribution function F does 
not specify the joint distribution of the stationary sequence. If, however, 
one is prepared to accept a Gaussian model, then the distribution Pp of 
the sequence is determined by F. In analogy with the i.i.d. case we have the 
following. 

Problem 6.1 (Kuiper spectral density problem). Determine the small- 
est integer k n for which there exists a spectral density f n with k n modes 
and whose distribution F n satisfies 



(6.4) d KV (E n ,F n )<qu(n,a,F F n,d KV ), 

where Fpn denotes the distribution of the observations under the model. 
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Frequency 

Fig. 11. Gardner data with four peaks and the periodogram. 
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There are two disadvantages with the procedure based on this concept of 
approximation. One is that the quantile in (6.4) depends on F n . It would 
be possible to overcome this by using the taut string S n at each stage and 
then simulating the quantile qu(n, a, c£ku)- This is clearly very time 
consuming. The second disadvantage is the following. Under appropriate 
conditions [Dahlhaus (1988)] we have the weak convergence result 

v^(F n -F)=>Z, 

where F n denotes the empirical spectral distribution function of the model 
with spectral distribution function F and density / and Z denotes a con- 
tinuous zero-mean Gaussian process defined by 

/•min(Ai,A 2 ) 

(6.5) E(Z(Ai)Z(A 2 ))= / f(u>) 2 du. 

Jo 

It follows from (6.5) that any large peaks will swamp smaller peaks which 
may be present and so prevent their detection. The one advantage of (6.4) 
is that it allows an asymptotic evaluation. 

A more sensitive procedure is based on some kind of multiresolution anal- 
ysis. Suppose for the moment that the sample size n is a power of 2, n = 2 m . 
Given a spectral density function /, we define 

(6-6) SnC/» = ^, 

and consider the multiresolution scheme 

j2 k 

(6-7) Wjk (f)= 9n(f,Ul,n), 

i=(j-l)2*+l 

j = l,...,2 m - fc -\fc = 0,...,m-l, 

where the ui iU = 2irl/n are the Fourier frequencies. The class of stationary 
processes with spectral density function / is too large to provide a meaning- 
ful definition of approximation so we now restrict attention to Gaussian pro- 
cesses. Corresponding to level-dependent thresholds for wavelets, we specify 
lower and upper bounds Ij^n and itfc,n> respectively, for the multiresolution 
coefficients (6.7). These now define the following. 

Problem 6.2 (Multiresolution spectral density problem). Determine 
the smallest integer k n for which there exists a spectral density f n with 
k n modes such that 

(6.8) l k , n <Wjk{f n )< u k,n, j = l,...,2 m - k -\k = 0,...,m-l. 
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The default bounds we use are lk, n — qu(ai n ,2 fe ) and Uk, n = qu(a2n>2 fc ), 
where qu(/3, v) denotes the /3-quantile of the gamma distribution with v 
degrees of freedom, a\ n = (1 — a)/2n and ctin = 1 — «in with a = 0.9. The 
bounds are based on the Gaussian model and the asymptotic results for 
such processes as given, for example, by Theorem 5.2.6 of Brillinger (1981). If 
the asymptotic results hold precisely for finite n, then the bounds are chosen 
such that, for a stationary Gaussian process with spectral density function 
/, the inequalities (6.8) hold with probability at least 0.9 for f n = f. As 
the individual g n (f,uj) of (6.6) for uj = ^2 are asymptotically independent, 
the bounds will be approximately of the correct order, again for Gaussian 
processes with a spectral density function. The usefulness of the bounds for 
real data sets is an empirical matter. In particular, they will be too slack if 
the spectral distribution function contains point masses. 

This is the case for the Gardner data given above and may be seen in 
Figure 11. The absolutely continuous part of the spectrum shows a degree 
of noise, whereas the remainder of the spectrum is noise free. The default 
bounds we propose will detect the first peak but they are not sufficiently 
tight to split the two main peaks. On the other hand, if the bounds are 
sufficiently tight to separate the two peaks, then superfluous peaks will be 
produced in the absolutely continuous part of the spectrum. There would 
seem to be no easy solution which will work equally well for continuous as 
well as for discrete spectra. 

We have no algorithm to solve the problem as it stands so again we use 
the local squeezing variant of the taut string method. The string is squeezed 
locally on the intervals where (6.8) fails and this is continued until all the in- 
equalities are satisfied. When doing this, however, care must be taken re- 
garding the order in which the inequalities are treated. From the form of 
g n (f,u) in (6.6) it is clear that a particular g n (f,uj) can be very large and 
influence all intervals containing this particular frequency and this although 
the corresponding e n (to) is very small. Squeezing locally over all intervals 
affected by this frequency will often produce many superfluous peaks. 

To avoid this we consider the intervals in order of size commencing with 
intervals of size 1. When all the inequalities are satisfied we then move on to 
intervals of size 2 and continue in this manner until all the inequalities are 
satisfied. This is the default version of the algorithm. If global squeezing is 
used, then the peaks will be introduced according to their power and may 
be introduced on intervals where the inequalities (6.8) are satisfied. This is 
the case for the Gardner data. If the default version with local squeezing is 
used, the main peak is not split. If, however, global squeezing is used, then 
it is split. 

A practical problem which occasionally occurs is that the local squeezing 
version may find peaks of very small power which have no practical relevance. 
They may be removed by increasing the baseline of the empirical spectral 
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density by a small amount. The software does this by first adding a small 
proportion of the total power, or the mean empirical spectral density, to 
the empirical spectral density and then proceeding as before. 

6.4. Asymptotics on test beds. We indicate briefly the results of an asymp- 
totic analysis using the Kuiper concept of approximation. The test bed we 
consider is that of a stationary process X n (F), 1 < n < oo, with a spectral 
distribution function F and spectral density function / as follows. 



Test bed 6.1. 

(i) F has exactly k local extreme values on the interval (0, tt). 

(ii) F satisfies 

inf sup \F(u)-G(u)\>0, 

GeF(fc-l) w 6 [ 0)7r ] 

where J-{k — 1) denotes the set of distributions with at most k—1 local 
extreme values. 



To investigate the behavior of the taut string on the Test bed 6.1, we 
consider a tube of width 2C/y/n and denote the taut string through this 
tube by S n (C) with derivative s n (C) and modality k% . The intervals on 
which s n {C) takes on its local extreme values will be denoted by If(n,C), 
i = 1, . . . , k% , with midpoints u> \ (n, C). The first theorem shows that on Test 
bed 6.1 the number and locations of the local extreme values are determined 
in a consistent manner. 

Theorem 6.1. Consider the Test bed 6.1. Then for all 5 > 0, 

lim liminfpf {A£ = k} D I max \If(n,C)\ < s\ n { max \tf(n,C) - tf \ < 5 

C->oc n^oo \ l n [l<i<k J [l<i<k 1 l 

= 1. 



To obtain rates of convergence on appropriate test beds we must impose 
further conditions. 



Test bed 6.2. 

(i) All spectral densities f 3 of order j exist and sup w |/- ? (a;)| < B 3 for some 
constant B. 

(ii) The spectral density function / = f 2 has a continuous second derivative 
/(2 ). 

(iii) / has exactly k local extreme values, < uj\, . . . ,ujf. < 2ir, and / W (cj) 7^ 
for lo G [0, 2-7r] \ {wi, . . . , Wfc}. 
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(iv) f&)(<j j )?0,j = l,...,k. 

(v) The fourth-order spectral density is continuous. 

The above conditions correspond to (i) of Assumption 2.1 of Dahlhaus 
(1988). 

Rates of convergence require a modulus of continuity for the process Z n — 
y/n(F n — F), where F n denotes the empirical spectral distribution function 
of the sample (X\(F), . . . ,X n (F)). Under the conditions of Theorem 2.4 of 
Dahlhaus (1988), it follows that 

(6.9) sup \Z n (<j0 2 ) - Z n (oJi)\ < Cy/iv 2 -uji |log(w 2 - 

0<o;i <a;2 <2-7T,a;2 — u>i <S 

with probability tending to 1 as 8 tends to zero. From this it can be shown 
that the rate of uniform convergence away from the local extremes is 0(((logn) 2 /n) 1 / 3 ). 
This differs from the rate of convergence for the test beds considered in 
Davies and Kovac (2001) by an extra logn term. This is explained by the dif- 
ferent modulus of continuity. On the test beds of Davies and Kovac (2001) 
it is y/8 | log <5| , whereas above it is y8\ log <5| . 

6.5. Examples. The default version we use is the procedure deriving from 
the multiresolution problem with a = 1 — 0.1 /n and a squeezing factor of 
0.9. For the sunspot data the result is the one-peak density shown in the top 
left panel of Figure 8. For the Gardner data the result is the three-peak 




~i 1 1 1 1 1 r 
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Fig. 12. Log spectral densities of a sample of size 1024 generated by the scheme (6.10). 
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density derived from the four-peak density shown in the bottom right panel 
of Figure 10 but with the major peak not split (see above). Finally, we 
consider data generated according to a scheme of Neumann (1996), which is 
as follows: 

(6.10) X n = Y n + c Z n , 

where 

Y n + aiY n _i + a 2 Y n - 2 = b e n + b X £ n -\ + b 2 e n - 2 

and {e n },{Z n } are independent Gaussian white noise processes with vari- 
ance 1. Neumann chose the coefficient values as follows: a\ = 0.2, a 2 = 0.9, 
bo = 1, b\ = 0, b 2 = 1 and Co = 0.5. A sample of size 1024 was generated ac- 
cording to this scheme. Figure 12 shows the logarithm of the spectral density 
of the sequence together with the logarithm obtained from the default 

version of the taut string method. The two peaks are correctly identified. 
The wavelet method used by Neumann results in six peaks [Neumann (1996), 
Figure 2(b)] for the data set he considered. 

7. Proofs. 

7.1. Proof of Theorem 4.1. Using the Glivenko-Cantelli theorem, the prop- 
erty of the taut string of minimizing the modality in T(F n , -^=) and Propo- 
sition 12.3.3 of Dudley (1989) we see that 

min(P(A£ < fc),P(*£ > k)) > p(Vg T(V n ,-^=)) 

> 1 - exp(-2C 2 ) 



and conclude that 



lim lim F(k% = k) = l. 

C^oon^oo 



The other claims are proved similarly. 
7.2. Proof of Theorem 4.2. 

Proof of (a). Since the empirical process E n = v / n(F n — F) is tight, 
we conclude [Billingsley (1968), page 106] that 

lim lim f( sup \E n (s) - E n (t)\ < ^] = 1, 

where r n = max(t| — £*•), with tj denoting the point where / takes its jth 

le value and tj 
interval of /„. 



local extreme value and t j denoting the left endpoint of the jth local extreme 
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From Theorem 4.1 we deduce that, for C and n sufficiently large, has 
the correct modality and 

(7.1) sup \E n (s)-E n (t)\<- 

s<t<2r n t> 

with arbitrarily high probability. 

Suppose is initially convex and t\ < if. Then F~ is the largest convex 
minorant of F n + C / yjn [Barlow, Bartholomew, Bremner and Brunk (1972)] 
until it reaches the left endpoint t[(n,C) of If(n,C) = [t[(n,C),t 7 l (n,C)]. 

For some constant 6 > 0, for each C and sufficiently large n, 

t\ — = arg max H(h), 

0<h<8 V ' 

where 

(7.2) H(h) ~ + ft) ~ Fn( * fl) ~ 2C/ ^ . 

As -F is convex on [ii,if], it can be shown using Taylor expansions that 

F{t\ + h)-F(tl) 



(7-3) 

defines a strictly h 
more, for all r < fi 



h 

defines a strictly increasing function on [0, where \i = t\ —t\. Further- 



-H(r)>G[-u)-G(r)+ ' 



>0. 



This shows that H cannot attain its maximum on [0,//] and consequently 
t\ >t\. Similar arguments hold for the other extrema. □ 

Proof of (b). We suppose that S n has a local maximum on If (n, C) = 
[t[(n,C),tl(n,C)], that if £ If and that (7.1) is satisfied. Define G by 

G{h) = F(t[ + h)-F(t[)-2C/^i ^ 

and consider /i = argmaxo^/^,5 G(h). Then G'(h ) = implies 

2C 

/(ti + /io)/io = *X*i + h ) - F(t[) - —. 

\/n 



Using Taylor expansions in if and the fact that /'(if) =0, we obtain 
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In the other direction we consider 

.... , F(tf + h)-F(t e 1 )-2C/y/E 
(7.4) hi = are max — — — 

o<h<s h 

and 

F(q-h)-F(tl)-2C/y/n 
ho = are mm i — — . 

o<h<8 h 

It is not difficult to see that ho < hi + hi . Setting the derivative of the right- 
hand side of (7.4) to zero and using a Taylor expansion in t\ yields 

The same argument holds for hi as well and both together show that 

12C 

Define H as in (7.2) and consider 

2 



ho = arg max G(h) 



'Cnh 

The considerations above show that 



v/H/"(«f) J ~ " v v " " V v/S/"(ff 



Furthermore, considerations as in (a) show that G{x) — -7= defines a strictly 
decreasing function. Therefore, for all h > (1 + -^)ho, 

H{ho) - H(h) > G{ho) - G{h) - > 0. 

Consequently, H cannot attain its maximum in h > ho{\ + -7=) and hence 

1 \/ ^(C + l/VCJN 1 / 3 



arg max H(h) < 1 + 
Similarly, it can be shown that 

" s .?S fl, " ) * ( : " r+W) { J( %rw ) ■ ° 

Proof of (c). The proof relies on the modulus of continuity of the em- 
pirical process E n . 
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Lemma 7.1. Let Y(n,C) denote random variables such that, for all e > 



0. 



lim lim ¥(\Y(n,C)\ < e) = 1. 

C->oon->oo Vl v ' n ' 



Consider a n = n 7 /or some 7 < 1 and 

TTien for all B > 2 we /iai>e 

lim lim pf max \E n {s) - E n { ^ = 



Proof. Define random integer- valued variables K n by 



0° 



Using a result of Mason, Shorack and Wellner (1983), we conclude that 

3 C < i 



provided < i, 



|£ ra ( s )-£ n (t)| ^ p 

max > is 

a n <| S -t|</3£ VI* - »l log(l/|* - S|) 



< ^ P( |S n (a) - E n (t)\ > J3^|i - s| log( — — ) for some s,t with 



fc=0 



2 fc a„ < Is - 1\ < 2 k+1 a n \k < K n 



where we denote 2 fc+1 a n , by a^, 
and 



'log(l/a r 



^ (x) = 2 (l + x)(log(l + x)-l) + l 



x 1 - 



It is easily verified that tp( 7jk= ) ~~ ^ 1- Thus 
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Putting this together, we deduce that 



E n (s) — E n (t)\ > B^j\t — s\ log^y^ — for some s,t with 

a n < \s — t\ < f3 n 
201og(n) 3 

< „ 7 (B/2-l) ' 

This completes the proof of the lemma. □ 

We proceed now with the proof of (c). Since / is twice continuously dif- 
ferentiable, there is some constant D > such that 

\F(x + h)- F{x) - hf(x) - \h 2 f'{x)\ < Dh 3 

for all x and h. 

Let B be an arbitrary constant greater than 2 and 

d(n, C) = minj |/'(x)| \x G [0, 1]\ (J If (n, C) | . 

Define a random sequence h(n, C) by 

ln ' ' $n)VH{n,C) 2 l3 ' 

We consider the situation where 

(i) f£ attains the correct modality; 

(ii) tfeIf(n,C) for all i; 

(iii) the empirical process satisfies 



sup \E n (t)-E n (s)\<BJ\s-t\log 

\s-t\<Y(n,C) V 



\s-t\ 



where Y(n,C) is defined by 

Y(n,C) = max{xj_|_i — Xj \ Xj,Xj + ± knots, [xj,Xj + i] ^If(n,C) for all i}; 

(iv) for all xe [O^WUi I?(n,C), 

32D 

holds; 

(v) for each extreme interval If(n,C), the distances of each endpoint to if 
are both smaller than 4h„. 
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The preceding lemmas and parts of this theorem show that the probability 
that all these assumptions are satisfied simultaneously converges to 1 as 
n and C tend to oo. For example, (7.2) follows from (b) which provides 
a constant A > such that \f'{x)\ > ArT 1 ^ . 

Consider now an arbitrary point t\ G [0, 1] \ IJj If{n, C), where /'(ii) > 0. 
Then 



Plugging in the expression for h n and using the assumptions made above, 
we see that 

W' + «- f -w< /(ll ) + V (ll )/ 1+ U> 



h n ~ v y 2 " v ' V 4 4 

Similarly, we conclude that, for all h £ [4/i n ,^], 



>/(ii) + ^/'(ii)(l-^-^ 



where is the smallest local extreme value greater than t±. 

Suppose that there are knots Xj and that do not embrace a local 
extreme interval such that = Xj+i — Xj > 4:h n and such that / is increasing 
on The width h is the local argmin 

~ . F n ( Xl + h) - F n ( Xl ) 
a = arg mm . 

0<h<5 h 

On the other hand, the considerations above show that 

F n (xi + h n )-F n {xi) F{x 1 + h)-F n {x 1 ) 
K h 

Therefore, the distance between two knots that do not embrace an extreme 
interval is bounded by 4/i n . □ 

Proof of (d). We assume that all the assumptions made in the proof 
of (c) are again satisfied and that each two extreme intervals If and If, 1 
are separated by at least two additional knots Xj and Xj + \: 

max If < Xj < Xj + \ < min/? +1 . 

Define h n as in (7.4). Consider a knot X{ which does not delimit a local 
extreme interval If. We take / to be increasing in Xj. Then the proof of (c) 
shows that 

,C { , F n (xi + h n ) - F n (xi) 

In \ x i) — 



n 
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Similar arguments show that 



,C ( \ ^ F n (xj ) - F n (x,i - h n ) 

In \ x i) — 



K 

n 



>/(*,)- Cl \f 

Analogous inequalities can be derived in the case where / is decreasing in 
Suppose now that t is an arbitrary point in 



?2 / \ n 



\Jlt{n,C). 



i=l 

Let Xi be the nearest knot which does not delimit a local extreme interval. 
Then 

\f(t)-ti{t)\ 

(7.6) 

< \f(t) - f( Xi )\ + \f(xi) - fZ( Xl )\ + \fZ(x t ) - f£(x)\. 
The inequalities above show that the second term is bounded by 

1/3 



c 2 |/'(^)l 1/3 (^ 

V n 



C 3 |t-x i ||/'(x i )|<C73|/'(^)| 1 / 3 f^ 



The first term is bounded by 

n J 

This follows from (b). 

Depending on the exact definition of /„ (x) at knot points, the third term 
is either or bounded by 2C 1 |/'(x i )| 1 / 3 (^M) 1 /3. 

This completes the proof of (d). □ 

Proof of (e). As in the other cases, we assume that attains the cor- 
rect modality and that t\ G If(n,C) for each extreme point if. We also as- 
sume that, for each extreme interval If, 

6{C -l/v^n 173 



1 + VCJ\ vV"(*i) 
<\Ii{n,C)\ 

1 \( 12(C + l/v 
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The regression function takes in if the slope of the taut string in the ex- 
treme interval If = [x\,X2\- Taylor expansions in tf using /'(if) = and an 
application of the modulus of continuity for the empirical process E n as 
formulated in Lemma 7.1 yield 

\ti{tf) - m)\ <d 1 (i+ 

The proof is now completed by extending the bound to arbitrary points in 
extreme intervals If. This is done in the usual way as in (7.6) using a Taylor 
expansion in t\ and shows that 

\f(t)-f(t!)\<D 2 \If\ 2 f"(tl). □ 

Software. The software is available from our home page at www.stat- 
math.uni-essen.de. A package for R is in preparation. 

Acknowledgments. We thank two referees and an Associate Editor for 
their comments on the first version of the paper. The final version has prof- 
ited from their remarks and suggestions. 

Note added in proof. After acceptance of this paper for publication we 
found that a small change of the notion of adequacy for densities leads to a 
considerable improvement in the performance of the procedure. In particu- 
lar, 

• a calibration using the uniform density is not necessary; 

• a constant density is fitted for almost all samples of the uniform distribu- 
tion; 

• the peaks of densities such as the claw density are detected more reliably. 
We consider differences of Kuiper metrics (fj^u 

(*) Pi (F,F n )=d 1 KV (F,F n )-d^(F,F n ), i = l,...,K, 

where d^u = 0- The distribution of each difference pi(F,F n ) is independent 
of F. In our modified K-Kuiper problem we require all differences to be 
smaller than some a-quantile of pi with a close to 1. Our default is a = 0.999. 

Problem 3.2' (Modified K-Kuiper density problem). Determine the 
smallest integer k n for which there exists a density f n with k n modes and 
whose distribution F n satisfies 

Pi(E n ,F n ) < qa(n,a,pi) 

for all i = 1, . . . , k. 
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Table 8 

Results for the taut string procedure based on the modified K-Kuiper criterion using k19. 

The numbers give the percentage of simulations in which the correct modality was 
obtained. The numbers in parentheses give the mean absolute deviation from the correct 
modality. The results are based on 1000 simulations with sample sizes of 100, 250, 500 

and 1000 



Dist. 


u 


S 


Nl 


N2 


N4 


N5 


iV10_5 


AT10_10 


100 


99 (0.0) 


100 (0.0) 


98 (0.0) 


8 (1.2) 


(2.1) 


3 (3.8) 


23 (3.8) 


63 (0.4) 


250 


99 (0.0) 


100 (0.0) 


100 (0.0) 


18 (0.8) 


(2.0) 


23 (2.6) 


80 (0.2) 


91 (0.1) 


500 


98 (0.1) 


100 (0.0) 


100 (0.0) 


53 (0.5) 


1 (2.0) 


76 (0.4) 


94 (0.1) 


95 (0.1) 


1000 


98 (0.1) 


100 (0.0) 


100 (0.0) 


86 (0.1) 


3 (2.0) 


100 (0.0) 


98 (0.0) 


97 (0.0) 



As for the original K-Kuiper problem quantiles may again be obtained 
by simulation. For large n > 1000 the distribution of y/npi(F n ,F) can be 
approximated by the corresponding quantile of a Brownian bridge using the 
weak convergence of the empirical process. 

The taut string procedure can be initiated by using the global bandwidth 
e° = 0.5 which corresponds to a constant approximating density /°. If all 
inequalities (*) are satisfied for /° we are finished. Otherwise assume that 
i is the smallest index such that an inequality (*) is not satisfied. Then we 
set e 1 = 0.5 • q\i(n,a, pi) which is the largest tube width such that the i-th 
difference pi of K-Kuiper metrics is sufficiently small. After a few iterations 
all pi will satisfy (*) and the final approximation is the taut string approx- 
imation with the maximal global bandwidth and hence minimal number of 
modes which is adequate for the data. 

Table 8 shows that the proposed procedure returns a constant function 
for the uniform density in about 99 per cent of the cases independent of n. 
At the same time the 10 peaks of the A r 10_5-density are found in 23 % of the 
cases by the new procedure for samples of size 100 and in 80 % for samples 
of size 250. The old procedure never found the correct number of peaks for 
samples of size 250. Only the performance for the bimodal iV2-distribution 
has deteriorated. The small peaks of the iV4 distribution are only detected 
occasionally even for large sample sizes. 
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